Replication Package for:
Signaling theory in entrepreneurial fundraising and crowdfunding research

Norbert Steigenberger
Marcel Garz
Thomas Cyron

October 12, 2024




##### INVENTORY #####

- the folder "comments" includes:
  --> subfolder "based on transformers"
      --> "fine_tuned_model_uncertainty": RoBERTa base model fine-tuned for uncertainty detection
      --> "predictions.csv": intermediate dataset including RoBERTa-based predictions of sentiment/uncertainty
      --> "sentiment uncertainty classifiers.ipynb": Jupyter notebook used to create "fine_tuned_model_uncertainty", "predictions.csv", and "wiki_trainingdata.csv"
      --> "wiki.xml": WikiWeasel corpus from https://rgai.inf.u-szeged.hu/node/160
      --> "wiki_trainingdata.csv": WikiWeasel corpus prepared for fine-tuning
  --> "comments sentiment get.R": script to create bag-of-words sentiment measures
  --> "comments sentiment ordinary.dta": intermediate Stata dataset including bag-of-words sentiment measures
  --> "comments uncertainty ordinary.dta": intermediate Stata dataset including bag-of-words uncertainty measures
  --> "kickstarter_comments.csv": comments from the investigated campaigns, retrieved from kickstarter.com
  --> "list uncertainty words.csv": list of uncertainty words extracted from WikiWeasel corpus
  --> "topic modeling.R": script to implement topic modeling
  --> "topic proportion.dta": intermediate dataset including topic proportions

- the folder "content of updates" includes:
  --> subfolder "merged_feeds": JSON files -- one for each Kickstarter project in the paper -- that include the content of updates retrieved via the Wayback Machine (https://wayback.archive.org/)
  --> "content of updates.dta": intermediate Stata dataset including measures of text length, readability, and lexical diversity of update texts
  --> "prepare content.R": script to process the JSON files, compute content measures, and create "content of updates.dta"

- the folder "daily funding raw data" includes screenshots from Kicktraq on projects' daily funding

- the folder "webrobots" includes:
  --> subfolder "raw": raw data from https://webrobots.io/kickstarter-datasets/
  --> "projects info prepare.do": script to read, process, and filter the webrobots data and create "projects info.dta"
  --> "projects info.dta": intermediate Stata dataset including project-level data

- "campaign day dataset analyze.do" implements the analyses and can be used to create the tables and figures in the paper

- "campaign day dataset create.do" uses the different data sources to compile the final analysis dataset

- "campaign day dataset.dta": final analysis dataset

- "daily funding.xls" contains the manually collected daily funding amounts of the projects (typed from the PNG files in "daily funding raw data")

- "links.txt" contains a list of URLs to Kickstarter projects in the category "video games" with a funding goal >= 25,000 USD and starting date between 01 Jan 2020 and 30 June 2021 (obtained via Kicktraq)

- "updates.xls" includes manually collected dates of updates posted by creators on Kickstarter



##### STEPS TO COMPILE MAIN ANALYSIS DATASET #####

1) Run "comments/based on transformers/sentiment uncertainty classifiers.ipynb"
2) Run "comments/comments sentiment get.R"
3) Run "comments/topic modeling.R"
4) Run "content of updates/prepare content.R"
5) Run "webrobots/projects info prepare.do"
6) Run "campaign day dataset create.do"
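
The steps above could be chained with a small driver script. The following is a minimal sketch, not part of the replication package: it assumes that "jupyter", "Rscript", and a Stata executable (here hypothetically "stata-mp") are available on the PATH, that all scripts can be run from the package root, and that file names follow the inventory above. By default it only prints the commands; pass --run to execute them.

```python
import subprocess
import sys
from pathlib import Path

# Steps in the order listed above; paths are relative to the package root.
STEPS = [
    "comments/based on transformers/sentiment uncertainty classifiers.ipynb",
    "comments/comments sentiment get.R",
    "comments/topic modeling.R",
    "content of updates/prepare content.R",
    "webrobots/projects info prepare.do",
    "campaign day dataset create.do",
]


def build_command(step, stata_exe="stata-mp"):
    """Return the command line for one step, chosen by file extension.

    The executable name for Stata is an assumption; it differs by
    edition and platform (e.g. "stata", "stata-se", "stata-mp").
    """
    suffix = Path(step).suffix.lower()
    if suffix == ".ipynb":
        # Execute the notebook and write the results back into the same file.
        return ["jupyter", "nbconvert", "--to", "notebook",
                "--execute", "--inplace", step]
    if suffix == ".r":
        return ["Rscript", step]
    if suffix == ".do":
        # Stata batch mode; file names with spaces may need extra quoting.
        return [stata_exe, "-b", "do", step]
    raise ValueError(f"Unsupported file type: {step}")


def run_all(root=".", dry_run=True):
    """Print each command in order and, unless dry_run is set, execute it."""
    for number, step in enumerate(STEPS, start=1):
        command = build_command(step)
        print(f"{number}) {' '.join(command)}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, cwd=root, check=True)


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```

The scripts themselves may set working directories or use absolute paths, so they may need adjusting before an unattended run like this works.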


  














